In this paper, we formulate the collaborative multi-user wireless video transmission problem as a multi-user Markov decision process (MUMDP) by explicitly considering the users' heterogeneous video traffic characteristics, time-varying network conditions, and the resulting dynamic coupling among the wireless users. These environment dynamics are often ignored in existing multi-user video transmission solutions. To comply with the decentralized nature of wireless networks, we propose to decompose the MUMDP into local MDPs using Lagrangian relaxation. Unlike conventional multi-user video transmission solutions stemming from the network utility maximization framework, the proposed decomposition enables each wireless user to individually solve its own dynamic cross-layer optimization (i.e., the local MDP) and the network coordinator to update the Lagrangian multipliers (i.e., resource prices) based on not only the current but also the future resource needs of all users, such that the long-term video quality of all users is maximized. However, solving the MUMDP requires statistical knowledge of the experienced environment dynamics, which is often unavailable before transmission time. To overcome this obstacle, we propose a novel online learning algorithm that allows the wireless users to update their policies in multiple states during each time slot, unlike conventional learning solutions, which typically update only one state per time slot. The proposed learning algorithm significantly improves learning performance, thereby dramatically improving the video quality experienced by the wireless users over time. Our simulation results demonstrate the efficiency of the proposed MUMDP framework compared to conventional multi-user video transmission solutions.
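The coordination mechanism described above (each user solves its own local problem given the current resource price, while a coordinator adjusts the price until aggregate demand matches capacity) follows the general pattern of dual decomposition with subgradient price updates. The following is a minimal illustrative sketch of that pattern only, not the paper's actual MUMDP algorithm; the utility model, weights, capacity, and step size are all hypothetical.

```python
# Illustrative sketch of dual decomposition (NOT the paper's MUMDP algorithm).
# Each user maximizes a concave utility minus a price-weighted resource cost;
# the coordinator updates the Lagrangian multiplier (resource price) by a
# subgradient step on the capacity constraint. All parameters are hypothetical.
from math import log

def local_best_response(weight, price):
    # User solves: max_{x >= 0} weight*log(1 + x) - price*x.
    # Setting the derivative weight/(1+x) - price = 0 gives x = weight/price - 1.
    return max(0.0, weight / price - 1.0)

def dual_ascent(weights, capacity, step=0.05, iters=500):
    price = 1.0  # initial Lagrange multiplier (resource price)
    for _ in range(iters):
        # Users independently compute their demands at the current price.
        demands = [local_best_response(w, price) for w in weights]
        # Coordinator raises the price if demand exceeds capacity, lowers it otherwise.
        excess = sum(demands) - capacity
        price = max(1e-6, price + step * excess)
    return price, demands

price, alloc = dual_ascent(weights=[1.0, 2.0, 3.0], capacity=4.0)
# At convergence, total allocated resource approximately equals the capacity.
```

In the paper's setting, each user's "local best response" is itself the solution of a local MDP (so prices reflect future as well as current resource needs), whereas the sketch uses a static closed-form utility purely to show the price-update loop.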